🧩 Visualization - Modular Functions for Impact Analysis Part II¶
Modular Functions for Impact Analysis & Visualization
This section provides reusable, parameterized functions for analyzing and visualizing performance metrics across temporal and categorical dimensions. Designed for flexibility and clarity, the functions support:
- Dynamic grouping by time (year, month_name, day_name) or category (store, promo, etc.)
- Preprocessing filters to exclude non-operational records (e.g., closed stores, zero-sales days)
- Statistical summaries including mean, standard deviation, and count
- Ranked insights with volatility and performance differentials
- Interactive visualizations via Plotly for enhanced interpretability
These tools enable scalable impact assessments and trend analyses across diverse datasets with minimal code repetition.
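The pattern described above (filter, group, summarize, rank) can be sketched in a few lines of pandas. This is a minimal illustration, not the code in `scripts/`: the helper name `summarize_by`, the `open` column, and the toy data are all assumptions made for the example.

```python
import pandas as pd

def summarize_by(df, group_col, metric, drop_closed=True):
    """Hypothetical helper: filter non-operational rows, then summarize a metric per group."""
    data = df
    if drop_closed:
        # ASSUMPTION: an 'open' flag marks operating days; zero-metric rows are dropped too
        data = data[(data["open"] == 1) & (data[metric] > 0)]
    return (
        data.groupby(group_col)[metric]
        .agg(avg="mean", std="std", count="count")  # named aggregation
        .sort_values("avg", ascending=False)        # rank groups by average
        .reset_index()
    )

# Toy data for illustration only (not taken from the Rossmann dataset)
toy = pd.DataFrame({
    "year": [2013, 2013, 2014, 2014, 2014],
    "open": [1, 1, 1, 0, 1],
    "sales": [100, 200, 300, 0, 500],
})
result = summarize_by(toy, "year", "sales")
print(result)
```

Because the grouping column and metric are parameters, the same call shape would work for `'month'` or `'customers'` without new code.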
1. Setup & Import Libraries¶
import time
from datetime import datetime
# Step 1: Setup & Import Libraries
print("Step 1: Setup and Import Libraries started...")
time.sleep(1) # Simulate processing time
Step 1: Setup and Import Libraries started...
# Data Manipulation & Processing
import os
import sys
import math
import numpy as np
import pandas as pd
# Warnings
import warnings
warnings.simplefilter('ignore')
🧩 Import Modular Functions¶
# Add the main project directory to path (go up 2 levels)
project_root = os.path.abspath('../../')
if project_root not in sys.path:
    sys.path.insert(0, project_root)
# Now import from scripts (since scripts/ has __init__.py, treat it as a package)
from scripts.viz_top10_stores import analyze_top_performers
from scripts.viz_temporal_trends import analyze_temporal_trends
from scripts.viz_holiday_impact import analyze_stateholiday_impact
from scripts.viz_promo_impact import analyze_promotion_impact
print("="*60)
print("Rossmann Store Sales Time Series Analysis - Part 2")
print("="*60)
print("All libraries and modules imported successfully!")
print("Analysis Date:", pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S'))
============================================================
Rossmann Store Sales Time Series Analysis - Part 2
============================================================
All libraries and modules imported successfully!
Analysis Date: 2025-08-16 01:00:53
print("✅ Setup and Import Libraries completed.\n")
✅ Setup and Import Libraries completed.
# Start Impact Analysis
viz_impact_analysis_begin = pd.Timestamp.now()
bold_start = '\033[1m'
bold_end = '\033[0m'
print("🔍 Viz impact Analysis Started ...")
print(f"🟢 Begin Date: {bold_start}{viz_impact_analysis_begin.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}\n")
🔍 Viz impact Analysis Started ...
🟢 Begin Date: 2025-08-16 01:00:53
Restore the viz_dataset & import modular functions¶
%store -r df_viz_feat
Top 5 performing days¶
# Run the analysis
analyze_top_performers(df_viz_feat, 'day', 'sales', 5)
Top 5 Day Performance Analysis:
=======================================================
Rank  Day   Average    % of #1
-------------------------------------------------------
1     Mon   € 8,217    100.0%
2     Sun   € 8,205    99.8%
3     Tue   € 7,091    86.3%
4     Fri   € 7,066    86.0%
5     Thu   € 6,756    82.2%
Summary Statistics:
Total days analyzed: 7
Top 5 average: €7,467
Overall average: €7,134
Top 5 outperform by: 4.7%

day
Mon    8217.443946
Sun    8204.634815
Tue    7090.987556
Fri    7066.366868
Thu    6756.031605
Name: sales, dtype: float64
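The "% of #1" column and the "Top 5 average" line can be reproduced from the returned Series. The sketch below assumes, without confirmation from the source of `analyze_top_performers`, that each average is divided by the top-ranked average and that the summary is a plain mean of the five values. The numbers are copied from the output above.

```python
import pandas as pd

# Top-5 day averages copied from the Series returned above
top5 = pd.Series(
    {"Mon": 8217.443946, "Sun": 8204.634815, "Tue": 7090.987556,
     "Fri": 7066.366868, "Thu": 6756.031605},
    name="sales",
)

# ASSUMPTION: "% of #1" is each average relative to the best-ranked average
pct_of_top = (top5 / top5.iloc[0] * 100).round(1)

# ASSUMPTION: "Top 5 average" is the unweighted mean of the five averages
top5_average = top5.mean()

print(pct_of_top)
print(f"Top 5 average: €{top5_average:,.0f}")
```

Under these assumptions the percentages come out as 100.0, 99.8, 86.3, 86.0, and 82.2, and the mean formats as €7,467, matching the printed report.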
Yearly sales trends¶
# Run the analysis
analyze_temporal_trends(df_viz_feat, 'year', 'sales')
Year Performance Analysis:
==================================================
Rank  Year     Average    Std Dev    Count
------------------------------------------------------------
3     2015.0   € 7,098    € 3,051    165,841
2     2014.0   € 7,026    € 3,129    310,385
1     2013.0   € 6,815    € 3,115    337,924
Key Insights:
Best year: 2015.0 (€7,098)
Worst year: 2013.0 (€6,815)
Performance range: €283
Volatility: 4.1%
| | year | avg | std | count |
|---|---|---|---|---|
| 2 | 2015 | 7098.0 | 3051.0 | 165841 |
| 1 | 2014 | 7026.0 | 3129.0 | 310385 |
| 0 | 2013 | 6815.0 | 3115.0 | 337924 |
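The "Performance range" figure is the gap between the best and worst yearly averages. The output does not say how "Volatility" is defined; the sketch below uses one possible reading, the range divided by the mean of the yearly averages, which gives about 4.1% on the rounded values in the table. That definition is an assumption, not the documented behavior of `analyze_temporal_trends`.

```python
import pandas as pd

# Yearly averages copied from the returned table above
yearly = pd.DataFrame({
    "year": [2015, 2014, 2013],
    "avg": [7098.0, 7026.0, 6815.0],
})

best = yearly.loc[yearly["avg"].idxmax()]
worst = yearly.loc[yearly["avg"].idxmin()]

# Gap between the best and worst yearly averages
performance_range = best["avg"] - worst["avg"]

# ASSUMPTION: volatility as range relative to the mean of the yearly averages
volatility_pct = performance_range / yearly["avg"].mean() * 100

print(f"Best year: {int(best['year'])}, worst year: {int(worst['year'])}")
print(f"Performance range: €{performance_range:,.0f}")
print(f"Volatility (assumed definition): {volatility_pct:.1f}%")
```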
Promotion impact on customers by month¶
# Run the analysis
analyze_promotion_impact(df_viz_feat, 'customers', 'month')
Promotion Impact Analysis - Customers by Month:
============================================================
Apr : No Promo 698 | Promo 863 | Lift +23.7%
Aug : No Promo 684 | Promo 838 | Lift +22.6%
Dec : No Promo 810 | Promo 998 | Lift +23.2%
Feb : No Promo 679 | Promo 805 | Lift +18.5%
Jan : No Promo 661 | Promo 799 | Lift +20.9%
Jul : No Promo 675 | Promo 855 | Lift +26.7%
Jun : No Promo 685 | Promo 858 | Lift +25.3%
Mar : No Promo 681 | Promo 845 | Lift +24.1%
May : No Promo 728 | Promo 842 | Lift +15.6%
Nov : No Promo 723 | Promo 845 | Lift +17.0%
Oct : No Promo 705 | Promo 819 | Lift +16.2%
Sep : No Promo 679 | Promo 835 | Lift +22.9%
Overall Impact:
Average lift from promotions: +21.3%
Additional revenue per day: 148
| | month | promo | customers |
|---|---|---|---|
| 0 | Apr | No Promo | 697.811024 |
| 1 | Apr | Promo | 862.927496 |
| 2 | Aug | No Promo | 683.697851 |
| 3 | Aug | Promo | 838.389713 |
| 4 | Dec | No Promo | 810.091292 |
| 5 | Dec | Promo | 998.239317 |
| 6 | Feb | No Promo | 679.252606 |
| 7 | Feb | Promo | 805.110211 |
| 8 | Jan | No Promo | 661.389975 |
| 9 | Jan | Promo | 799.341393 |
| 10 | Jul | No Promo | 675.142099 |
| 11 | Jul | Promo | 855.104890 |
| 12 | Jun | No Promo | 685.067610 |
| 13 | Jun | Promo | 858.470969 |
| 14 | Mar | No Promo | 680.716898 |
| 15 | Mar | Promo | 844.620730 |
| 16 | May | No Promo | 728.140178 |
| 17 | May | Promo | 841.680937 |
| 18 | Nov | No Promo | 722.733964 |
| 19 | Nov | Promo | 845.477052 |
| 20 | Oct | No Promo | 704.588344 |
| 21 | Oct | Promo | 818.751183 |
| 22 | Sep | No Promo | 679.458912 |
| 23 | Sep | Promo | 834.975067 |
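The per-month "Lift" values are consistent with a relative difference between the promo and no-promo averages. The sketch below pivots three months from the returned table and applies that formula; it illustrates the arithmetic and is not the implementation of `analyze_promotion_impact`.

```python
import pandas as pd

# Three months copied from the returned table above
rows = [
    ("Apr", "No Promo", 697.811024), ("Apr", "Promo", 862.927496),
    ("Aug", "No Promo", 683.697851), ("Aug", "Promo", 838.389713),
    ("Dec", "No Promo", 810.091292), ("Dec", "Promo", 998.239317),
]
long_df = pd.DataFrame(rows, columns=["month", "promo", "customers"])

# One row per month, one column per promo state
wide = long_df.pivot(index="month", columns="promo", values="customers")

# Relative lift of promo days over non-promo days, in percent
wide["lift_pct"] = (wide["Promo"] / wide["No Promo"] - 1) * 100

print(wide.round(1))
```

For these three months the computed lifts round to 23.7, 22.6, and 23.2, matching the printed report.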
Holiday impact on customers by month¶
# Run the analysis
analyze_stateholiday_impact(df_viz_feat, 'customers', 'month')
State Holiday Impact Analysis - Customers by Month:
======================================================================
Overall Holiday Impact:
-------------------------
Public : 1,279 (+67.4% vs regular)
Easter : 1,687 (+120.8% vs regular)
Christmas : 1,569 (+105.4% vs regular)
Regular Days: € 764 (baseline)
Holiday Impact by Month:
-----------------------------------
Apr : Regular 774 | Holiday 777 | Impact +0.5%
Aug : Regular 750 | Holiday 755 | Impact +0.6%
Dec : Regular 886 | Holiday 1,569 | Impact +77.2%
Feb : Regular 730 | Holiday 743 | Impact +1.8%
Jan : Regular 722 | Holiday 1,249 | Impact +73.0%
Jul : Regular 761 | No holiday data
Jun : Regular 758 | Holiday 1,096 | Impact +44.5%
Mar : Regular 768 | Holiday 742 | Impact -3.4%
May : Regular 777 | Holiday 1,418 | Impact +82.6%
Nov : Regular 782 | Holiday 2,578 | Impact +229.5%
Oct : Regular 752 | Holiday 1,212 | Impact +61.3%
Sep : Regular 746 | No holiday data
Store Operations Impact:
-------------------------
Total store closures: 168,492 (17.1%)
Holiday closures: 39,994
Regular closures: 128,498
| | month | stateholiday | customers |
|---|---|---|---|
| 0 | Apr | Easter | 1618.790698 |
| 1 | Apr | Normal Day | 773.629811 |
| 2 | Aug | Normal Day | 750.410150 |
| 3 | Aug | Public | 754.884615 |
| 4 | Dec | Christmas | 1569.225352 |
| 5 | Dec | Normal Day | 885.667303 |
| 6 | Feb | Normal Day | 729.541412 |
| 7 | Jan | Normal Day | 722.209148 |
| 8 | Jan | Public | 1249.491803 |
| 9 | Jul | Normal Day | 761.380498 |
| 10 | Jun | Normal Day | 758.481370 |
| 11 | Jun | Public | 1095.966019 |
| 12 | Mar | Easter | 2235.937500 |
| 13 | Mar | Normal Day | 768.046634 |
| 14 | May | Normal Day | 776.910586 |
| 15 | May | Public | 1418.377483 |
| 16 | Nov | Normal Day | 782.217463 |
| 17 | Nov | Public | 2577.615385 |
| 18 | Oct | Normal Day | 751.844338 |
| 19 | Oct | Public | 1212.465116 |
| 20 | Sep | Normal Day | 745.742245 |
print("✅ Data Visualization Impact Analysis completed.\n")
✅ Data Visualization Impact Analysis completed.
print("✅ Features Engineering and Data Visualization (I) completed successfully!")
print(f"🗓️ Analysis Date: {bold_start}{pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
✅ Features Engineering and Data Visualization (I) completed successfully!
🗓️ Analysis Date: 2025-08-16 01:01:01
🌟 Advantages¶
- Reusable across 'year', 'month', 'dayofweek', etc.
- Easy to change aggregation type ('sum', 'median', etc.)
- Consistent naming and sorting
- Makes your code far more modular for dashboards or reporting
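Swapping the aggregation type can be as simple as passing its name through to pandas. A minimal sketch, assuming a hypothetical helper `aggregate_metric` and toy data; the real functions in `scripts/` may expose this differently.

```python
import pandas as pd

def aggregate_metric(df, group_col, metric, agg="mean"):
    """Hypothetical helper: aggregate a metric per group with a caller-chosen function."""
    # The aggregation name ('sum', 'median', 'mean', ...) is forwarded to pandas
    return df.groupby(group_col)[metric].agg(agg).sort_values(ascending=False)

# Toy data for illustration only
toy = pd.DataFrame({
    "day": ["Mon", "Mon", "Tue", "Tue", "Tue"],
    "sales": [10, 30, 5, 5, 20],
})

totals = aggregate_metric(toy, "day", "sales", agg="sum")
medians = aggregate_metric(toy, "day", "sales", agg="median")
print(totals)
print(medians)
```

The sorting step is shared by every aggregation, so the ranking order stays consistent regardless of which statistic is requested.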
Why Reusability Matters¶
- 💡 Scalability: You can plug your functions into larger pipelines or production environments without rewrites.
- 🛠️ Maintainability: A bug fix in one utility can instantly improve multiple workflows.
- 🚀 Efficiency: Spend less time rewriting logic and more time interpreting results.
Why This Matters for Rossmann Store Sales¶
- We’ll likely repeat the same aggregations or visualizations across hundreds of stores.
- Promos, holidays, and weekday patterns demand consistent filtering and analysis.
- Modular functions help you prototype insights fast, scale across stores, and iterate smoothly.
# End analysis
viz_impact_analysis_end = pd.Timestamp.now()
duration = viz_impact_analysis_end - viz_impact_analysis_begin
# Final summary print
print("\n📋 Features Engineering && Data Viz Summary")
print(f"🟢 Begin Date: {bold_start}{viz_impact_analysis_begin.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
print(f"✅ End Date: {bold_start}{viz_impact_analysis_end.strftime('%Y-%m-%d %H:%M:%S')}{bold_end}")
print(f"⏱️ Duration: {bold_start}{str(duration)}{bold_end}")
📋 Features Engineering && Data Viz Summary
🟢 Begin Date: 2025-08-16 01:00:53
✅ End Date: 2025-08-16 01:01:01
⏱️ Duration: 0 days 00:00:07.717127
Project Design Rationale: Notebook Separation¶
To promote clarity, maintainability, and scalability within the project, data engineering and visualization tasks are intentionally separated into distinct notebooks. This modular approach prevents the accumulation of excessive code in a single notebook, making it easier to debug, update, and collaborate across different stages of the workflow. By isolating data transformation logic from visual analysis, each notebook remains focused and purpose-driven, ultimately enhancing the overall efficiency and readability of the project.